ABSTRACT

A new broad absorption line quasar (BAL) sample is derived from the first 
data released by the Sloan Digital Sky Survey. With 116 objects, it is the largest 
BAL sample yet assembled. Over the redshift range 1.8 < z < 3.8, the crude 
fraction with broad absorption in the C IV line is ~ 15%. This fraction may be 
subject to small selection-efficiency adjustments. There are also hints of redshift- 
dependence in the BAL fraction. The sample is large enough to permit the first 
estimate of the distribution of "balnicity index": subject to certain arbitrary 
parameters in the definition of this quantity, it is very broad, with (roughly) 
equal numbers of objects per logarithmic interval of balnicity. BAL quasars are 
also found to be redder on average than non-BAL quasars. The fraction of radio- 
loud BAL quasars is (weakly) consistent with the fraction of radio-loud ordinary 
quasars. 

Subject headings: quasars: absorption lines, galaxies: active

1. Introduction 



Broad Absorption Line quasars (BALs) are one of the most enigmatic varieties of 
quasars. Resonance lines of ordinary ions — H I, C IV, N V, O VI, Mg II, and others — are seen 
in absorption that spreads, often in highly irregular fashion, as much as 60,000 km s^-1 from
line-center in the quasar rest-frame to the blueward. Previous surveys (e.g. the Large Bright 
Quasar Survey, or LBQS: Weymann et al. (1991)) have shown that BALs, while a minority 
of all quasars, are not rare; a population fraction ~ 10% is typically estimated. Because few 
of their other properties are grossly different from ordinary quasars, it is generally thought 
that all quasars have BAL material, but it covers only a fraction of solid angle around the 
quasar nucleus (Weymann et al. 1991). However, subtleties of selection can complicate 
the inference of covering fraction from population fraction (Goodrich 1997; Krolik & Voit 
1998).

Numerous technical difficulties have retarded growth in our understanding of BALs.
Known cases are relatively rare, numbering fewer than ~ 100, not solely because they are a
minority of the general population but also because they are readily found only when their 
characteristic features are red-shifted from their rest-frame wavelengths in the ultraviolet 
into the visible band. Consequently, only those quasars found in somewhat special redshift 
intervals can be easily searched for broad absorption. It is hard to statistically character- 
ize those BALs that are found because the methods used to discover them often involve 
some level of subjectivity that is hard to quantify. Even if their selection were easier to 
articulate, there appears to be so much variation in their properties (profile shapes, relative 
line strengths, etc.) that it is hard to grasp which properties are generic and which are 
"accidental".

The quasar sample being compiled by the Sloan Digital Sky Survey (SDSS: York et al. 
(2000)) offers a way out of this impasse. When complete, it will be both very large (~ 10^5 in
all) and selected in a uniform and quantifiable manner. In future work, we hope to present 
statistical analysis of BALs in this entire sample. Here we offer a preliminary installment on 
this project in the form of a more modest BAL sample drawn from the first data released 
from the project to public view, the Early Data Release (EDR: Stoughton et al. (2002)). 

Several collections of BALs have already been drawn from early Sloan data (Menou et 
al. 2001; Hall et al. 2002); these were, however, oriented toward "by-eye" selection of
small subsamples special in some way (radio-loud in the former case, extraordinary profiles
in the latter). The work reported here differs in that it is the first attempt to create a 
systematically-selected sample from the SDSS. 

From the EDR, Schneider et al. (2002) created a quasar catalog containing 3814 quasars, 
selected (mostly) on the basis of their location in four-color space and on a (mostly) uniform 
i-magnitude limit. In order to present more clearly-defined statistics, we have refined this 
sample so that it is almost homogeneously-selected (see §2.1). Within that sub-sample (about
80% of the full EDR quasar catalog), roughly one-quarter (796) fall within the redshift range 
within which it is feasible to search for C IV BAL features. 

With an eye toward the homogeneity of selection to be achieved in the full SDSS, 
we invented an automated BAL selection algorithm that processes SDSS spectral data in 
a uniform way and identifies BAL quasars in a uniform manner (see §2.2). Using this 
algorithm, we have identified 116 BAL quasars, whose statistical properties are discussed in 
§3. Although the EDR represents a tiny fraction of the ultimate SDSS quasar sample, the 
BAL sample so derived is now the largest (as well as the most homogeneously selected) such 
sample known.

2. Details of Sample Selection

2.1. Quasar selection 

The EDR quasar catalog (Schneider et al. 2002) was compiled by applying several 
different selection criteria (see Stoughton et al. (2002) for details). Most of its objects were 
chosen on the basis of colors lying outside the "stellar locus" in the four-color space formed 
by the Sloan five-filter (u, g, r, i, and z) photometry^. However, roughly 20% of the quasars
in this list were chosen in a much less well-defined fashion (in the jargon of the SDSS, these 
were selected for the "serendipity" sample). In addition, much smaller numbers of quasars 
were selected not on the basis of their photometric colors, but because they were close in 
the sky to known radio sources in the FIRST catalog (Becker et al. 1995) or X-ray sources 
in the Rosat All-Sky Survey (Voges et al. 1999). For the purposes of this paper, which 
concentrates on statistics, we have pruned the EDR quasar catalog to include only those 
flagged as quasar candidates by a color-based targeting algorithm.

Because the EDR data were compiled during the test-year of the project, the color- 
based selection algorithm was not the same throughout. With regard to issues relevant here, 
the variations can be reduced to two slightly different versions, very nearly equal in sky 
coverage. In both, the primary flux limit was i < 19 mag ^. Both versions also shared the 
same primary color criterion: select only those objects whose colors lie at least away from 
the stellar locus. An exception was made to this rule in order to cope with the fact that at 
redshifts z ~ 2.5 - 3.0, quasars have colors nearly indistinguishable from those of A and/or 
F type stars (Fan 1999; Richards et al. 2001). In this portion of the stellar locus, quasar 
candidates were selected, but at lower efficiency (Stoughton et al. 2002; G.T. Richards, 
private communication).

The two versions differed in two ways. One version rejected objects with colors
approximating those of A stars; in the other, objects with colors similar to those of hot white dwarfs
or unresolved M dwarf-white dwarf pairs were also removed from the quasar candidate list.
In addition, in those runs in which white-dwarf-like colors were rejected, a special
procedure was adopted in order to enhance sensitivity to quasars with z > 3.5. To find more of
these quasars, the magnitude limit was relaxed to i = 20 in the region of four-color space
where previously-located high-redshift quasars were found. We will show below that these
variations in quasar candidate selection had negligible effects on BAL quasar discovery.

^Technically, because the photometry published in the EDR had not received final calibration, the
magnitudes were shown as i*, etc. In this paper, all such magnitudes should be understood in that sense, although
we will forgo making the distinction explicit.

^Objects were also required to be fainter than a limit chosen to avoid image saturation and cross-talk
between spectroscopic fibers. This was i = 15.0 mag in some cases, i = 16.5 mag in others; because so few
objects are near the bright limit, the change makes essentially no difference to sample statistics.

Quasar candidates were labelled as "quasars" by the spectroscopic pipeline if the
cross-correlation between their spectra and a quasar template spectrum was greater than the
cross-correlation with any of the other templates (stars, galaxies, etc.). For confirmation,
objects were required to pass two further tests: that their spectra possess at least one
emission line with FWHM > 1000 km s^-1; and that their absolute magnitude M_i < -23 (for
H_0 = 50 km s^-1 Mpc^-1 and q_0 = 0.5).

In the full EDR quasar catalog, there were 3814 objects. Our sample contains only the 
3107 identified by one of the two versions of the color-selection rules. 

2.2. BAL identification 

The C IV line is centered at 1550 Å in the rest-frame, so it appears in the SDSS
spectra (which nominally cover the wavelength range from 3900-9100 Å) only for redshifts
1.5 < z < 4.9. However, several effects limit this range further. First, although the nominal
blue cut-off is 3900 Å, in practice, throughput, and therefore signal/noise, drop sharply
shortward of ~ 4100 Å and (more gradually) longward of ~ 8000 Å. Moreover, in order to
measure possible absorption to the blue of line-center, we must be able to see some line-free 
continuum to the redward of the emission line-center and also follow the line far enough to the 
blue that we are confident we have defined the entire absorption profile. These requirements 
restrict the permissible range of redshifts to roughly 1.8 < z < 3.8, cutting our sample size 
to 796 objects. 
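The nominal limits quoted above follow directly from the spectrograph coverage. As a minimal sketch (the function and constant names are illustrative and not taken from the SDSS pipeline), the snippet below computes the redshift window in which a rest-frame feature falls inside a given observed wavelength range. The tighter 1.8 < z < 3.8 cut depends on signal-to-noise and on the need to see continuum on both sides of the line, which this sketch does not model.

```python
def redshift_window(rest_wavelength, obs_min, obs_max):
    """Redshift range over which a rest-frame feature lands inside
    [obs_min, obs_max]. All wavelengths are in Angstroms."""
    z_lo = obs_min / rest_wavelength - 1.0
    z_hi = obs_max / rest_wavelength - 1.0
    return z_lo, z_hi

C_IV_REST = 1550.0                   # rest wavelength quoted in the text
SDSS_BLUE, SDSS_RED = 3900.0, 9100.0  # nominal spectral coverage

z_lo, z_hi = redshift_window(C_IV_REST, SDSS_BLUE, SDSS_RED)
print(f"C IV visible for {z_lo:.1f} < z < {z_hi:.1f}")
```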

Redshifts supplied by the SDSS spectroscopic pipeline are typically accurate to ~
1000 km s^-1, which suffices for the cut described in the previous paragraph, but is not
accurate enough for absorption line measurement. These measurements require greater accuracy
because the classical definition of BALs (Weymann et al. 1991) counts only absorption at
least 3000 km s^-1 to the blue of line-center in the rest-frame.
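To make the scale of these numbers concrete, the following sketch converts between redshift error, velocity, and rest-frame wavelength. It assumes the non-relativistic Doppler formula, which the text does not specify, and the helper names are hypothetical.

```python
C_KM_S = 299792.458  # speed of light in km/s

def redshift_error_to_velocity(delta_z, z):
    """Velocity uncertainty (km/s) implied by a redshift uncertainty delta_z."""
    return C_KM_S * delta_z / (1.0 + z)

def velocity_to_rest_wavelength(v_km_s, line_center):
    """Rest-frame wavelength at velocity v relative to line_center.
    Non-relativistic approximation; negative velocity means blueward."""
    return line_center * (1.0 + v_km_s / C_KM_S)

# A redshift error of 0.01 at z = 2 is about 1000 km/s.
dv = redshift_error_to_velocity(0.01, 2.0)
# The 3000 km/s threshold sits about 15.5 Angstroms blueward of C IV.
lam = velocity_to_rest_wavelength(-3000.0, 1549.5)
print(round(dv), round(lam, 1))
```

A 1000 km s^-1 redshift error is thus a third of the 3000 km s^-1 offset that defines where BAL absorption starts to count, which is why a more precise rest frame is needed.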

The C IV emission line cannot be used to define the redshift to this level of accuracy 
because the very absorption we are interested in can cut into the emission line so severely 
that it is unclear where line-center occurs. Instead, we define the quasar rest-frame in terms 
of the C III] 1909 emission line^. To determine the redshift this way, we fit a Gaussian plus a
linear component to the measured flux in the (pipeline-redshift) rest-frame wavelength range
1860 - 1960 Å. The center of the Gaussian in the best fit we take to define the true observed
wavelength for rest-frame 1909 Å.

^According to Vanden Berk et al. (2001), this line is, on average, offset only ~ 200 km s^-1 from the

To search for absorption (which can be extremely difficult in these objects, in which
emission and absorption features can occupy most of the spectrum), one must first locate
the continuum. Our solution to this problem is to first fit a power-law to the continuum
data in five line-free windows: 1790-1820 Å, 1975-2000 Å, 2140-2155 Å, 2240-2255 Å, and
2265-2695 Å. Holding that component fixed, we then fit a half-Gaussian to the red half of
the C IV emission feature lying above the fitted continuum, taking line-center as 1549.5 Å
in the rest-frame (this wavelength derived by equally weighting the two components of this
doublet). We deliberately ignore the blue half of the emission line so as to avoid confusion
by absorption; the continuum windows are chosen so as to avoid contamination by the
He II 1640, C III] 1909, and Mg II 2800 emission lines, as well as from various Fe II emission
complexes. We then extrapolate the power-law portion of this fit to define the continuum
blueward of 1550 Å.
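The power-law step of this procedure can be sketched as follows. This is an illustration only: the text does not say how the fit was weighted or minimized, so the example assumes an unweighted least-squares straight line in log-log space, and all function names are hypothetical. The half-Gaussian emission component is not modeled here.

```python
import math

# Line-free rest-frame windows (Angstroms) as quoted in the text.
CONTINUUM_WINDOWS = [(1790, 1820), (1975, 2000), (2140, 2155),
                     (2240, 2255), (2265, 2695)]

def in_windows(wavelength, windows=CONTINUUM_WINDOWS):
    return any(lo <= wavelength <= hi for lo, hi in windows)

def fit_power_law(wavelengths, fluxes, windows=CONTINUUM_WINDOWS):
    """Fit F = A * wavelength**alpha using only pixels inside the windows.
    Assumption: unweighted least squares on (log wavelength, log flux).
    Returns (A, alpha)."""
    pts = [(math.log(w), math.log(f))
           for w, f in zip(wavelengths, fluxes)
           if in_windows(w, windows) and f > 0]
    n = len(pts)
    if n < 2:
        raise ValueError("need at least two continuum points")
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    alpha = sxy / sxx
    amplitude = math.exp(my - alpha * mx)
    return amplitude, alpha

def continuum(wavelength, amplitude, alpha):
    """Evaluate the fitted power law, e.g. extrapolated blueward of 1550 A."""
    return amplitude * wavelength ** alpha

# Synthetic spectrum: a pure power law plus a fake emission bump that lies
# outside every window, so it should not bias the fit.
wl = [1400.0 + i for i in range(1400)]
fl = [3.0e4 * w ** -1.5 + (5.0 if 1880 < w < 1940 else 0.0) for w in wl]
A, alpha = fit_power_law(wl, fl)
print(round(alpha, 3), round(continuum(1500.0, A, alpha), 4))
```

Restricting the fit to the windows is what keeps emission features from pulling the continuum upward; a real implementation would presumably also weight pixels by their noise.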

The final step in our procedure is to compute the "balnicity index" for each quasar in 
this redshift range, following the definition given in Weymann et al. (1991): 

$$
\mathrm{BI} = \int_{-25{,}000}^{-3000} dv \left(1 - \frac{F_\lambda}{0.9\,C_\lambda}\right) C, \tag{1}
$$

where the measured flux per unit wavelength is F_λ, the extrapolated fitted continuum is C_λ,
and C is a function whose value is unity when the quantity between parentheses has been
positive for at least 2000 km s^-1 to the red of the current wavelength and zero otherwise. The
lower-limit on the integral is designed to avoid confusion with the Si IV line, the upper-limit 
to exclude associated absorption. Comparison is made to 0.9 times the continuum rather 
than the full continuum to be conservative with respect to noise features. The function C 
ensures that only truly broad features are counted. In effect, the balnicity index amounts 
to a sort of equivalent width. Following Weymann et al., we declare a quasar to be a BAL 
when its balnicity index is greater than zero. 
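The definition above translates into a short numerical routine. The sketch below is a hedged illustration rather than the code used for this sample: it assumes the spectrum has already been divided by the extrapolated continuum and resampled onto a velocity grid (negative velocities blueward), uses a simple rectangle rule, and tracks troughs only inside the integration range. The function name and arguments are hypothetical.

```python
def balnicity_index(velocities, flux_ratio, v_min=-25000.0, v_max=-3000.0,
                    min_width=2000.0, depth_scale=0.9):
    """Balnicity index in km/s.

    velocities : km/s relative to C IV line-center, negative = blueward
    flux_ratio : measured flux divided by the fitted continuum
    """
    # Walk from the red edge of the range toward the blue edge.
    pixels = sorted(zip(velocities, flux_ratio), reverse=True)
    pixels = [(v, r) for v, r in pixels if v_min <= v <= v_max]
    bi = 0.0
    run_start = None   # velocity where the current trough began
    prev_v = None
    for v, ratio in pixels:
        dv = 0.0 if prev_v is None else prev_v - v   # positive pixel width
        prev_v = v
        depth = 1.0 - ratio / depth_scale
        if depth > 0.0:
            if run_start is None:
                run_start = v
            # C = 1 only once the trough has persisted for min_width.
            if run_start - v >= min_width:
                bi += depth * dv
        else:
            run_start = None
    return bi

# Synthetic examples on a 10 km/s grid from 0 to -30,000 km/s.
velocities = [-10 * i for i in range(3001)]
broad = [0.45 if -12000 <= v <= -6000 else 1.0 for v in velocities]
narrow = [0.45 if -5500 <= v <= -4000 else 1.0 for v in velocities]
bi_broad = balnicity_index(velocities, broad)
bi_narrow = balnicity_index(velocities, narrow)
print(bi_broad, bi_narrow)
```

In the first synthetic case the trough is 6000 km s^-1 wide at half the depth threshold, so only its bluest 4000 km s^-1 contribute and the index comes out at roughly 2000 km s^-1. The second trough is only 1500 km s^-1 wide and contributes nothing, so that object would not be classed as a BAL.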



systemic host redshift as defined by [O III] 5007. For similar reasons, Weymann et al. (1991) used a weighted
mean of the C IV, C III], and Mg II emission lines to determine their redshifts.



3. Results

In view of the preliminary nature of this sample, here we present only a few tentative 
results. The statistical and systematic uncertainties in the numbers presented here will be 
substantially reduced in the far larger full SDSS sample; that sample will also enable other, 
more detailed studies. 



3.1. BAL fraction 

Of the 796 quasars in the redshift range where we can search for BALs, we find that
116, or ~ 15%, are C IV BALs. The apparent BAL fraction in this sample is also strongly
dependent upon redshift (Fig. 1): it is only ~ 10% in the redshift range 1.8 < z < 2.2, but
rises to ~ 33% near z ~ 2.7, and averages ~ 20% for 2.2 < z < 3.8.
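As a quick check on the headline number, the snippet below computes the raw fraction and a purely statistical 1σ uncertainty. The normal approximation to the binomial error is an assumption made for illustration; the text does not state which error estimator was used for the figures.

```python
import math

def fraction_with_error(n_hits, n_total):
    """Fraction and its 1-sigma binomial uncertainty (normal approximation)."""
    p = n_hits / n_total
    sigma = math.sqrt(p * (1.0 - p) / n_total)
    return p, sigma

p, sigma = fraction_with_error(116, 796)
print(f"BAL fraction = {100 * p:.1f}% +/- {100 * sigma:.1f}%")
```

The statistical error on the whole sample is small compared with the swing between redshift bins, although each bin contains far fewer objects and so carries a correspondingly larger uncertainty.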

Some of this redshift dependence may be real, but there are also redshift-dependent 
systematic effects. Near z ~ 1.8, some BALs may be lost due to the relatively poor S/N 
at the blue end of the spectrograph. In the range 2.5 < z < 3, distinguishing quasars and
stars by color becomes difficult. Both the broad absorption itself and intrinsic differences in 
continuum shape (§3.3) can give BALs colors different from ordinary quasars; their selection 
efficiencies can therefore differ significantly in this redshift range. The spike in the "raw" 
BAL fraction near z ~ 2.7 may be the result of this differential selection efficiency (see
Reichard et al. (2003) for further exploration of this issue). 

On the other hand, dividing the sample according to the color-selection procedure used, 
we find negligible differences. Both BAL and ordinary quasar selection efficiencies were equal 
to well within Poisson errors. 

The LBQS found a significantly smaller "raw" fraction (9%), but Weymann et al. cor- 
rected this figure to ~ 12% because the BAL itself removed enough flux that many BAL 
quasars dropped below the survey flux-limit. In the SDSS, by contrast, the flux-limit is 
applied in the i-band, near 8500 Å. Only for redshifts ~ 4 would a C IV 1550 BAL influence
the i-band flux, and we do not even consider such high-redshift quasars in the sample at 
hand. Consequently, the SDSS BAL fraction needs to be corrected for this effect only at 
very high redshift or for the special case of "LoBALs", BAL quasars with absorption in the
Mg II 2800 line. When Mg II absorption is present, the BAL would remove flux from the 
i-band when z ~ 2.5. 

Overall, then, particularly for z > 2.2, the SDSS appears to find a somewhat larger BAL
fraction than the LBQS. Allowing for the various systematic errors, our best preliminary
estimate is a BAL fraction ~ 15 - 20% for this redshift range, but possibly nearer 10 -
12% for 1.8 < z < 2.2. We expect these numbers to be refined as the statistics improve
and the systematic effects become better understood. Using the full SDSS quasar sample
it should become possible to search for genuine redshift- (or luminosity-) dependence in the
BAL fraction.

Fig. 1. BAL fraction in the sample as a function of redshift. Errorbars are 1σ, and purely
statistical.



3.2. Balnicity index distribution 

The BI distribution for our sample is shown in Figure 2. The plot shows log[dN/d log(BI)] =
log[BI dN/d(BI)]; within the loose constraints placed by relatively small sample size, there
are roughly equal numbers of objects in equal logarithmic bins. However, we stress that
the shape of this distribution below ~ 1000 km s^-1 is strongly dependent upon the
arbitrary velocity offset parameter used in the definition of balnicity to distinguish "associated
absorbers" from truly "broad" absorption (cf. the discussion in Hall et al. (2002)).
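A histogram of this kind can be built by counting objects in bins of equal width in log(BI); dividing each count by the bin width gives dN/d log(BI), which equals BI dN/d(BI) up to the constant factor between natural and base-10 logarithms. The sketch below uses a hypothetical helper and made-up BI values purely to show the binning; they are not measurements from this sample.

```python
import math

def log_histogram(values, log_min, log_max, n_bins):
    """Count values in bins of equal width in log10(value).
    Returns a list of (bin_center_log10, count, dN/dlog10) tuples."""
    width = (log_max - log_min) / n_bins
    counts = [0] * n_bins
    for v in values:
        if v <= 0:
            continue   # BI = 0 objects are not BALs and have no logarithm
        k = int((math.log10(v) - log_min) // width)
        if 0 <= k < n_bins:
            counts[k] += 1
    return [(log_min + (k + 0.5) * width, c, c / width)
            for k, c in enumerate(counts)]

# Hypothetical BI values in km/s, for illustration only.
example_bi = [3, 30, 50, 300, 500, 3000, 5000, 12000, 0]
hist = log_histogram(example_bi, log_min=0.0, log_max=5.0, n_bins=5)
for center, count, density in hist:
    print(f"log10(BI) ~ {center:.1f}: N = {count}")
```

A distribution that is flat in this representation, as reported here, means each decade of BI holds about the same number of objects.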

The shape of this distribution has several interesting implications. First, because BALs 
of very small BI are common, the arbitrariness of the velocity offset parameter means that
the distinction between weak BALs and "associated absorbers" is difficult to mark and the
nominal BI for these objects likely underestimates the "physical" absorption. Second, if we
take the definition of balnicity at face value, the breadth of its distribution is consistent 
with the anecdotal sense of the diversity of BAL profiles derived from previous, smaller 
samples. Third, the shallow slope of the distribution at the high-balnicity end suggests that 
the maximum velocity width of BALs is probably as yet ill-defined. 

3.3. Colors of BALs 

The mean quasar color is a strong function of redshift (Fan 1999; Richards et al. 2001). 
To contrast the colors of BALs and non-BAL quasars most clearly, in Figure 3 we show the 
distribution of colors after subtracting the mean color for our sample at each quasar's redshift. 
Particularly in u - g, the distribution of BAL colors is shifted distinctly to the red (see also
Menou et al. (2001)). In the mean, the g - r color difference is about 0.18 mag; in u - g
it is about 0.34 mag. Both color offsets are crudely constant with redshift for 1.8 < z < 3.
This trend is in the same sense as (although somewhat smaller than) the color contrast in the
FIRST survey (Becker et al. 2000; Brotherton et al. 2001), in which radio-selected BALs
were, on average, ~ 0.5 mag redder (in a color roughly equivalent to B - R) than
radio-selected non-BAL quasars.
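The redshift-normalized colors described above can be sketched as below. Two details are assumptions made for illustration, since the text does not give them: the mean color is taken over all quasars (BAL and non-BAL together) in fixed redshift bins, and the bin width of 0.1 is arbitrary. The data are synthetic.

```python
from collections import defaultdict

def relative_colors(quasars, bin_width=0.1):
    """quasars: list of dicts with keys 'z', 'color', 'is_bal'.
    Returns (color minus mean color of the redshift bin, is_bal) pairs."""
    bins = defaultdict(list)
    for q in quasars:
        bins[int(q["z"] // bin_width)].append(q["color"])
    means = {k: sum(v) / len(v) for k, v in bins.items()}
    return [(q["color"] - means[int(q["z"] // bin_width)], q["is_bal"])
            for q in quasars]

def mean_offset(rel):
    """Mean relative color of BALs minus that of non-BALs (mag)."""
    bal = [c for c, is_bal in rel if is_bal]
    non = [c for c, is_bal in rel if not is_bal]
    return sum(bal) / len(bal) - sum(non) / len(non)

# Synthetic catalog: two redshift bins with very different mean colors.
catalog = [
    {"z": 2.05, "color": 0.1, "is_bal": False},
    {"z": 2.06, "color": 0.1, "is_bal": False},
    {"z": 2.07, "color": 0.1, "is_bal": False},
    {"z": 2.08, "color": 0.5, "is_bal": True},
    {"z": 3.05, "color": 1.0, "is_bal": False},
    {"z": 3.06, "color": 1.0, "is_bal": False},
    {"z": 3.07, "color": 1.4, "is_bal": True},
]
offset = mean_offset(relative_colors(catalog))
print(f"BALs are redder by {offset:.2f} mag")
```

Subtracting the bin mean first removes the strong redshift trend, so the two redshift bins can be combined without the high-redshift objects dominating the comparison.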

Fig. 2. Distribution of BI in the sample (solid histogram). The binning is logarithmic in
BI. Errorbars are 1σ.

Fig. 3. The distribution of colors after subtracting off the mean color of quasars at the
individual quasars' redshifts. Solid line is non-BAL quasars, dashed line is BAL quasars. The
left-hand panel is the distribution for normalized u - g; the right-hand panel for normalized
g - r.

There are several possible explanations for this effect. It may be, for example, that
there is dust associated with the absorbing material itself. It is also possible that the redder 
colors are due to dust, but farther from the nucleus along our line-of-sight (e.g., as proposed 
by Goodrich (1997)). On the other hand, BAL quasars may differ from ordinary quasars 
in some intrinsic fashion, perhaps having a different mean ratio of luminosity to Eddington 
luminosity. Alternatively, we may view them from a special angle. If the optical continuum 
is generated in an accretion disk, one might expect both that the continuum shape would 
depend on L/L_E and that there is wavelength-dependent limb-darkening (e.g., Hubeny et
al. (2000)). The latter effect would create a systematic color offset if the absorbing matter 
lies in a special direction relative to the disk. 

That BAL quasars are preferentially redder than ordinary quasars affects our ability 
to find them. The reason our BAL fraction is larger than the fraction in the LBQS may 
be that quasars in the LBQS were selected in part on the basis of blue colors. If so, the 
comparative lack of color bias in the SDSS may be critical for obtaining a fair estimate of 
the size and character of the BAL population. In addition, if the redder colors are also 
associated with continuum flux that is weaker in the BAL direction, the fraction of the sky
around the nucleus covered by BAL material would be larger than the population fraction 
of BALs (Goodrich 1997; Krolik & Voit 1998). 



3.4. Radio-loud fraction

We close with a brief comment about the radio properties of these BAL quasars. Wey- 
mann et al. (1991) found that none of their BALs was radio-loud and therefore suggested 
an anti-correlation between the two properties. On the other hand, Becker et al. (2000) 
argued that, if anything, BAL quasars were more likely to be found in radio-selected than 
in optically-selected samples. 

Radio-loudness is often defined as R = F_ν(5 GHz)/F_ν(4400 Å) > 10 - 30. This criterion
can be applied to only about 4% of the SDSS quasars because radio data are available for
only those brighter at 1.4 GHz than the FIRST flux limit. Given the optical flux limit of
the SDSS quasar sample, essentially all those quasars in the EDR detected by FIRST are
radio-loud by this criterion^. Five of the 116 BALs found in our sample are radio-loud by
this definition. Our results are therefore consistent with the proposition that there is no
difference between the radio-loud BAL fraction and the radio-loud fraction among ordinary
quasars. However, in view of the very small number of objects and the incompleteness of
radio data for our sample, this conclusion must be tentative at best.

^Comparing this radio-loud fraction to the ~ 15% found by Kellermann et al. (1989) in the PG sample
suggests that many radio-loud quasars in the SDSS are a little too faint to have been detected by FIRST.
Ivezic et al. (2002) estimate a radio-loud fraction of ~ 8%; relative to this fraction, there are still numerous
radio-loud quasars in this sample that must fall just below the FIRST detection limit.
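The consistency claimed here can be illustrated with a simple binomial calculation. It assumes, for illustration only, that the parent radio-loud detection rate is exactly the "about 4%" quoted in the text and that BALs are drawn independently from that population.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success rate p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n_bal = 116              # BAL quasars in the sample
n_radio_loud_bal = 5     # of which radio-loud
p_parent = 0.04          # "about 4%" of SDSS quasars, as quoted in the text

expected = n_bal * p_parent
p_at_least = 1.0 - sum(binomial_pmf(k, n_bal, p_parent)
                       for k in range(n_radio_loud_bal))
print(f"observed {n_radio_loud_bal}, expected {expected:.1f}, "
      f"P(N >= {n_radio_loud_bal}) = {p_at_least:.2f}")
```

With an expectation of between four and five objects, finding five is unremarkable, but counts this small could also accommodate radio-loud fractions a factor of two higher or lower, which is why the conclusion is stated so cautiously.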

We thank Tim Heckman, Gordon Richards, and Tim Reichard for numerous helpful 
conversations and suggestions. J.H.K. was partially supported by NASA grant NAG5-9187. 